(57) ABSTRACT

The invention relates to an encoding method for the com- 
pression of a video sequence divided into groups of frames 
decomposed by means of a tridimensional wavelet trans- 
form. According to this method, based on the hierarchical 
subband encoding process SPIHT and applied to the band- 
pass subbands of a spatio-temporal orientation tree defining 
the spatio-temporal relationship within the hierarchical 
pyramid of the obtained transform coefficients, a vectorial 
DPCM, using either constant prediction coefficients or adap- 
tive ones for taking into account scene changes, is used to 
separately encode the lowest frequency spatio-temporal 
subband, and the quantification of the prediction error 
observed when constructing a spatio-temporal predictor for 
each vector of transform coefficients having components in 
each frame of said subband is carried out by means of a 
scalar or vectorial quantization. The final binary stream 
resulting from these modulation and quantification steps is 
encoded by a lossless technique minimising the entropy of 
the whole message. 

6 Claims, 6 Drawing Sheets 
ENCODING METHOD FOR THE COMPRESSION OF A VIDEO SEQUENCE

FIELD OF THE INVENTION

The present invention relates to an encoding method for the compression of a video sequence divided in groups of frames decomposed by means of a tridimensional (3D) wavelet transform leading to a given number of successive resolution levels, said method being based on a hierarchical subband encoding process called "set partitioning in hierarchical trees" (SPIHT) and leading from the original set of picture elements (pixels) of each group of frames to transform coefficients encoded with a binary format and constituting a hierarchical pyramid, said coefficients being ordered by means of magnitude tests involving the pixels represented by three ordered lists called list of insignificant sets (LIS), list of insignificant pixels (LIP) and list of significant pixels (LSP), said tests being carried out in order to divide said original set of picture elements into partitioning subsets according to a division process that continues until each significant coefficient is encoded within said binary representation, and a spatio-temporal orientation tree (in which the roots are formed with the pixels of the approximation subband resulting from the 3D wavelet transform and the offspring of each of these pixels is formed with the pixels of the higher subbands corresponding to the image volume defined by these root pixels) defining the spatio-temporal relationship inside said hierarchical pyramid.

BACKGROUND OF THE INVENTION

In video compression schemes, the reduction of temporal redundancy is mainly achieved by two types of approaches. According to the first one, the so-called "hybrid" or predictive approach, a prediction of the current frame is computed based on the previously transmitted frames, and only the prediction error is intra-coded and transmitted. In the second one, the temporal redundancy is exploited by means of a temporal transform, which is similar to spatial techniques for removing redundancies. In this last technique, called the 3D or 2D+t approach, the sequence of frames is processed as a 3D volume, and the subband decomposition used in image coding is extended to 3D spatio-temporal data by using separable transforms (for example, wavelet or wavelet packet transforms implemented by means of filter banks). The anisotropy in the 3D structure can be taken into account by using different filter banks in the temporal and spatial directions (Haar filters are usually chosen for temporal filtering since the added delay observed with longer filters is undesirable; furthermore, Haar filters, which are two-tap filters, are the only perfect reconstruction orthogonal filters which do not present the boundary effect).

It was observed that the coding efficiency of the 3D coding scheme can be improved by performing motion estimation/compensation in the low temporal subbands, at each level of the temporal decomposition. Therefore, the present scheme includes motion estimation/compensation inside subbands, and the 3D subband decomposition is applied on the compensated group of frames. An entire three-stage temporal decomposition is described in FIG. 1. Each group of frames in the input video sequence must contain a number of frames equal to a power of two (usually 16; in the present example, 8). The rectilinear arrows indicate the low-pass (L) temporal filtering (continuous arrows) and the high-pass (H) one (dotted arrows), and the curved ones designate the motion compensation between two frames. At the last temporal decomposition level, there are two frames in the lowest temporal subband. In each frame of the temporal subbands, a spatial decomposition is performed. In this framework, subband coding of the three-dimensional structure of data can be realized as an extension of the spatial subband coding techniques.

One of the most important wavelet-based schemes for image compression, which was recently extended to the 3D structure of subbands, is the bidimensional set partitioning in hierarchical trees, or 2D SPIHT, described in the document "A new, fast, and efficient image codec based on set partitioning in hierarchical trees", by A. Said and W. A. Pearlman, IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, No. 3, June 1996, pp. 243-250. The basic concepts used in this 3D coding technique are the following: spatio-temporal trees corresponding to the same location are formed in the wavelet domain; then, the wavelet transform coefficients in these trees are partitioned into sets defined by the level of the highest significant bit in a bit-plane representation of their magnitudes; finally, the highest remaining bit planes are coded and the resulting bits transmitted.

A common characteristic of the SPIHT algorithm presented above, in its 2D as well as in its 3D version, is that the spatial (respectively spatio-temporal) orientation trees are defined beginning with the lowest frequency subband and represent the coefficients related to the same spatial (or spatio-temporal) location. This way, with the exception of the lowest frequency band, all parents have four (in 2D) or eight (in 3D) children. Let (i,j,k) represent the coordinates of a picture element (pixel) in the 3D transform domain: if it is not in the lowest spatio-temporal frequency subband and it is not in one of the last level subbands, then its offsprings have the coordinates:

$$ O = \{(2i,2j,2k),\ (2i+1,2j,2k),\ (2i,2j+1,2k),\ (2i,2j,2k+1),\ (2i+1,2j+1,2k),\ (2i+1,2j,2k+1),\ (2i,2j+1,2k+1),\ (2i+1,2j+1,2k+1)\} $$

For the sake of simplicity, the still picture case is illustrated in FIG. 2 (subbands s-LLLL, s-LLLH, etc.).

In the image coding domain, compression algorithms by zerotrees have been extensively studied in recent years and several improvements have been proposed. For example, in the MPEG-4 standard, a variant of such an algorithm (see for instance the document "Embedded image coding using zerotrees of wavelet coefficients", by J. M. Shapiro, IEEE Transactions on Signal Processing, vol. 41, No. 12, December 1993, pp. 3445-3462) was adopted for the still picture coding mode, in which the lowest spatial subband is independently coded using a DPCM technique. Subsequently, spatial orientation trees are formed starting in the detail subbands (all subbands except s-LLLL, the first one), as illustrated in FIG. 3.

SUMMARY OF THE INVENTION

It is an object of the invention to propose a new type of video encoding method, in the 3D case.

To this end, the invention relates to an encoding method such as defined in the introductory paragraph and which is moreover characterized in that:

(A) a vectorial differential pulse code modulation (DPCM) is used to separately encode the lowest frequency spatio-temporal subband, or approximation subband, according to the following conditions:

(a) a spatio-temporal predictor, using not only values at the same location in past frames of the video sequence but also neighbouring values in the current
frame, is constructed for each vector of coefficients having components in each frame of the approximation subband, said vectorial coding feature coming from the fact that the lowest frequency subband contains spatial low frequency subbands from at least two frames;

(b) said DPCM uses constant prediction coefficients;

(B) the quantification of the prediction error is carried out by means of a scalar quantization of the two vector components, followed by an assignment of a unique binary code associated to the probability computed for each given couple of quantized values;

(C) the binary stream resulting from the steps (A) and (B) is encoded by a lossless process minimizing the entropy of the whole message.

In another embodiment, the invention relates to a similar method, but characterized in that:

(A) a vectorial differential pulse code modulation (DPCM) is used to separately encode the lowest frequency spatio-temporal subband, or approximation subband, according to the following conditions:

(a) a spatio-temporal predictor, using not only values at the same location in past frames of the video sequence but also neighbouring values in the current frame, is constructed for each vector of coefficients having components in each frame of the approximation subband, said vectorial coding feature coming from the fact that the lowest frequency subband contains spatial low frequency subbands from at least two frames;

(b) said DPCM uses constant prediction coefficients;

(B) the quantification of the prediction error is carried out by means of a vectorial quantization using an optimal quantizer based on a generalized Lloyd-Max algorithm, a joint Laplacian probability density function for the two components of the quantized prediction error vector being considered for said optimization;

(C) the binary stream resulting from the steps (A) and (B) is encoded by a lossless process minimizing the entropy of the whole message.

Whatever the embodiment, said DPCM may also be adaptive, the coefficients of the spatio-temporal predictor now taking into account scene changes by means of a least mean squares estimation of these coefficients for each group of frames.

BRIEF DESCRIPTION OF THE DRAWINGS

The particularities and advantages of the invention will now be explained with reference to the following embodiment described hereinafter and considered in connection with the drawings in which:

FIG. 1 illustrates the temporal subband decomposition of a group of 8 frames of the input video sequence in a tridimensional subband decomposition with motion compensation;

FIG. 2 shows spatial orientation trees in 2D-SPIHT, in the still picture case;

FIG. 3 shows MPEG-4 like spatial orientation trees for bidimensional zerotree coding (s-LLLL is coded separately);

FIG. 4 illustrates a block diagram of the known SPIHT algorithm;

FIGS. 5 and 6 show respectively spatio-temporal orientation trees in 3D-SPIHT and modified spatio-temporal orientation trees;

FIG. 7 shows the pixels used for constructing the spatio-temporal predictor in the vectorial DPCM coding of the two frames in the lowest subband;

FIG. 8 illustrates a block diagram of the encoding method according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

With the 3D video scheme here proposed, the lowest frequency subband of the 3D spatio-temporal decomposition is independently coded, while the other subbands are encoded using the 3D SPIHT algorithm. This implies, however, some important modifications. One will focus here on the 3D structure, which is of interest for the invention, and a specific feature of the SPIHT algorithm, a block diagram of which is illustrated in FIG. 4, will be recalled. For the lowest spatio-temporal subband, the parent-offspring relationships are defined as follows: blocks of 8 pixels are formed by grouping two pixels in each direction. Their offsprings are defined as the groups of 8 pixels corresponding to the same location in the 7 adjacent detail subbands. One pixel in the group of 8 has no offspring, while every other pixel has a block of 8 pixels as offspring. If one denotes by M, N, T the dimensions of the group of frames and one considers J decomposition levels, the dimensions of the lowest frequency subband are $M_J = M/2^J$, $N_J = N/2^J$, $T_J = T/2^J$. The offsprings of the coefficient located at (i,j,k) in the lowest frequency subband are:

$$ O = \{(i-1+M_J,\ j-1+N_J,\ k-1+T_J),\ (i+M_J,\ j-1+N_J,\ k-1+T_J),\ (i-1+M_J,\ j+N_J,\ k-1+T_J),\ (i-1+M_J,\ j-1+N_J,\ k+T_J),\ (i+M_J,\ j+N_J,\ k-1+T_J),\ (i+M_J,\ j-1+N_J,\ k+T_J),\ (i-1+M_J,\ j+N_J,\ k+T_J),\ (i+M_J,\ j+N_J,\ k+T_J)\} $$

Trees are therefore formed taking as roots the pixels in the lowest frequency subband. This technique used in 3D video coding was also implemented for the compression of 3D medical images, but in that case the motion estimation and compensation stage was skipped.

In this framework, the modification here proposed (FIG. 6) for the 3D SPIHT is to independently encode the lowest spatio-temporal subband t-LL-s-LLLL. This subband contains the lowest spatial frequency subbands of the two frames in the lowest temporal subband, so the information in this band can be seen as vectorial information: pixels with the same indexes in the two spatial subbands are grouped into vectors which will have the same index. This is illustrated in FIG. 7 for the two frames contained in the lowest temporal subband, and in particular for the lowest spatial frequency subband in these frames. In order to compress this information, it is proposed to use a vectorial adaptive DPCM (differential pulse code modulation) technique (separately coding the two frames would clearly result in lower performance).

The zerotree coding by set partitioning in hierarchical trees is used for the encoding of the detail subbands. Zerotree coding is based on the observation that if a wavelet coefficient at a higher level of the pyramid is insignificant with respect to a given threshold, then all the coefficients corresponding to the same spatio-temporal location in lower levels of the pyramid are likely to be insignificant with respect to this threshold as well. When this is the case, all these coefficients can be efficiently encoded with a single symbol, called a zerotree root. A coefficient is called significant with respect to a threshold if its absolute value is greater than the threshold, and insignificant otherwise. For the transmission, the wavelet coefficients are ranked according to their binary representation and the most significant bits are sent first.

The vectorial adaptive DPCM technique used to encode the lowest spatio-temporal frequency subband will now be described. To this end, one denotes by (i,j) the coordinates of the current pixel in the lowest frequency subband and by
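$x_{ij}$ and $y_{ij}$ the values of the coefficients at this index in the first and second frame, respectively, of the lowest temporal subband (see FIG. 7). A linear spatio-temporal predictor for the vector

$$ s_{ij} = \begin{pmatrix} x_{ij} \\ y_{ij} \end{pmatrix} $$

is constructed based on the following equation, with $(n, m) \in A$:

$$ \hat{s}_{ij} = \sum_{(n,m) \in A} P_{nm}\, s_{i-n,\,j-m} \qquad (1) $$

where:

$s_{i-n,\,j-m}$, $(n,m) \in A$, are the nearest neighbours of $s_{ij}$;

$\hat{s}_{ij}$ represents the predictor of $s_{ij}$; and

$$ P_{nm} = \begin{pmatrix} a_{nm} & b_{nm} \\ c_{nm} & d_{nm} \end{pmatrix} $$

are the matrices of the prediction coefficients.

For example, in reference to FIG. 7, one has:

$$ \hat{s}_{ij} = P_{1,1}\, s_{i-1,\,j-1} + P_{1,0}\, s_{i-1,\,j} + P_{0,1}\, s_{i,\,j-1} \qquad (2) $$

where

$$ P_{1,1} = \begin{pmatrix} a_{1,1} & b_{1,1} \\ c_{1,1} & d_{1,1} \end{pmatrix}, \quad P_{1,0} = \begin{pmatrix} a_{1,0} & b_{1,0} \\ c_{1,0} & d_{1,0} \end{pmatrix}, \quad P_{0,1} = \begin{pmatrix} a_{0,1} & b_{0,1} \\ c_{0,1} & d_{0,1} \end{pmatrix} $$

In Equation (2), the coefficients $a_{1,1}, a_{1,0}, a_{0,1}$ realize a spatial prediction in frame 1, $d_{1,1}, d_{1,0}, d_{0,1}$ form a spatial prediction in frame 2, while the coefficients denoted by $b_{1,1}, b_{1,0}, b_{0,1}$ and $c_{1,1}, c_{1,0}, c_{0,1}$ correspond to spatio-temporal predictions.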
In Equation (1), fixed prediction coefficients may be used. 
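As an illustration of Equation (2) with fixed coefficients, here is a minimal Python sketch. It assumes the subband is stored as an array `s` of shape (rows, cols, 2) holding $(x_{ij}, y_{ij})$, that neighbours falling outside the subband contribute nothing, and it uses a hypothetical "planar" coefficient set ($P_{1,0} = P_{0,1} = I$, $P_{1,1} = -I$, so $b = c = 0$) chosen only for demonstration. The helper names `predict` and `prediction_errors` are likewise hypothetical. It computes the open-loop prediction error; an actual DPCM encoder would predict from the reconstructed (quantized) neighbours so that the decoder can track it.

```python
import numpy as np

# Causal neighbourhood A of Equation (2): (n, m) -> neighbour s[i - n, j - m].
NEIGHBOURS = ((1, 1), (1, 0), (0, 1))


def predict(s, i, j, P):
    """Vector prediction s_hat[i, j] = sum over A of P[(n, m)] @ s[i - n, j - m].

    s has shape (rows, cols, 2): s[i, j] = (x_ij, y_ij), the co-located
    coefficients of frame 1 and frame 2 of the approximation subband.
    P maps each (n, m) in NEIGHBOURS to a 2x2 matrix [[a, b], [c, d]].
    Neighbours outside the subband contribute nothing (assumed border rule).
    """
    pred = np.zeros(2)
    for n, m in NEIGHBOURS:
        ii, jj = i - n, j - m
        if ii >= 0 and jj >= 0:
            pred += P[(n, m)] @ s[ii, jj]
    return pred


def prediction_errors(s, P):
    """Open-loop prediction error e[i, j] = s[i, j] - s_hat[i, j], raster order."""
    rows, cols, _ = s.shape
    e = np.empty(s.shape, dtype=float)
    for i in range(rows):
        for j in range(cols):
            e[i, j] = s[i, j] - predict(s, i, j, P)
    return e


# Hypothetical fixed coefficients: a planar predictor applied to each frame
# separately (b = c = 0, i.e. no cross-frame terms).
I2 = np.eye(2)
P_FIXED = {(1, 1): -I2, (1, 0): I2, (0, 1): I2}
```

With these coefficients, any field that is linear in i and j is predicted exactly away from the first row and column, so the error concentrates on the borders.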

Another possible implementation is to find the optimal prediction coefficients for each group of frames by minimizing the mean square error of the prediction error. This adaptive strategy gives better results than the fixed case, at the expense of computational complexity. The prediction error is the difference between the real value of the vector $s_{ij}$ and its predicted value $\hat{s}_{ij}$.
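The adaptive strategy can be sketched as a batch least-squares fit of the three matrices of Equation (2) over one group of frames. This is one reading of the "least mean squares estimation" mentioned in the text; an iterative LMS filter would be another. The array layout (rows, cols, 2), the restriction to positions whose three neighbours all exist, and the helper names `estimate_predictor` and `mean_square_error` are assumptions made for illustration.

```python
import numpy as np

NEIGHBOURS = ((1, 1), (1, 0), (0, 1))  # causal neighbourhood A of Equation (2)


def estimate_predictor(s):
    """Least-squares estimate of the 2x2 matrices P_nm for one group of frames.

    s has shape (rows, cols, 2). Only positions whose three causal neighbours
    all exist (i >= 1, j >= 1) are used as training samples.
    Returns a dict mapping (n, m) -> 2x2 array [[a, b], [c, d]].
    """
    rows, cols, _ = s.shape
    Z, Y = [], []
    for i in range(1, rows):
        for j in range(1, cols):
            # Regressor: the three neighbour vectors stacked into 6 values.
            Z.append(np.concatenate([s[i - n, j - m] for n, m in NEIGHBOURS]))
            Y.append(s[i, j])
    # Solve min ||Z W - Y||^2; W has shape (6, 2) and s_hat = W.T @ z.
    W, *_ = np.linalg.lstsq(np.asarray(Z), np.asarray(Y), rcond=None)
    Wt = W.T
    return {nm: Wt[:, 2 * k:2 * k + 2] for k, nm in enumerate(NEIGHBOURS)}


def mean_square_error(s, P):
    """Mean squared prediction error over the same training positions."""
    rows, cols, _ = s.shape
    err = [s[i, j] - sum(P[(n, m)] @ s[i - n, j - m] for n, m in NEIGHBOURS)
           for i in range(1, rows) for j in range(1, cols)]
    return float(np.mean(np.square(err)))
```

Because the fit minimizes the squared error jointly over both components, no other set of matrices can yield a smaller `mean_square_error` on the same group of frames. Re-running the fit for each group is what lets the predictor follow scene changes.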
The prediction error can be vectorially quantized using an optimal quantizer based on a generalized Lloyd-Max algorithm. A simple choice is to consider for the optimization a joint Laplacian probability density function for the two components of the quantized prediction error vector.
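A minimal sketch of the vectorial quantization step follows, using the generalized Lloyd algorithm (alternating nearest-neighbour partition and centroid update) to design a 2-D codebook. As an assumption for illustration, the "joint Laplacian" density is modelled here as two independent zero-mean Laplacian components with unit scale, and it is represented by training samples rather than by integrating the density analytically. The codebook size of 16 and the helper names are hypothetical.

```python
import numpy as np


def nearest(vectors, codebook):
    """Index of the closest codeword (squared Euclidean distance) for each vector."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def generalized_lloyd(training, size, iters=50, seed=0):
    """Design a 2-D vector quantizer codebook with the generalized Lloyd algorithm.

    Alternates the nearest-neighbour condition (partition the training
    vectors) and the centroid condition (move each codeword to the mean of
    its cell). Empty cells keep their previous codeword.
    """
    rng = np.random.default_rng(seed)
    codebook = training[rng.choice(len(training), size, replace=False)].copy()
    for _ in range(iters):
        idx = nearest(training, codebook)
        for k in range(size):
            cell = training[idx == k]
            if len(cell):
                codebook[k] = cell.mean(axis=0)
    return codebook


def distortion(vectors, codebook):
    """Mean squared error per vector after quantization."""
    q = codebook[nearest(vectors, codebook)]
    return float(((vectors - q) ** 2).sum(axis=1).mean())


# Assumed model: two independent zero-mean Laplacian components (scale 1).
rng = np.random.default_rng(1)
training = rng.laplace(loc=0.0, scale=1.0, size=(5000, 2))
codebook = generalized_lloyd(training, size=16)
```

Each iteration can only lower (or keep) the training distortion, which is why the algorithm converges to a locally optimal codebook. Only the index of the chosen codeword would then be passed to the lossless coder.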

The implementation chosen here is based on a scalar quantization of the two vector components, followed by the assignment of a unique binary code to the couple of components. This is possible if, for each couple of quantized values, one computes the probability of this event and associates to it a unique binary code, so as to minimize the entropy of the message. A technique for choosing this code is arithmetic entropy coding, described for example in "Arithmetic coding for data compression", by I. H. Witten et al., Communications of the ACM, vol. 30, No. 6, June 1987, pp. 520-540. The global diagram of the proposed video coding system is presented in FIG. 8, where it clearly appears that only the lowest frequency subband (detected by the test "is detail?") of the 3D spatio-temporal decomposition is independently coded, after a scalar quantization, by means of a vectorial entropy coding.
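The scalar-quantization-plus-joint-code path can be sketched as follows. Each component is quantized with a uniform mid-tread quantizer (the step size is a free, hypothetical parameter), the empirical probability of every couple is computed, and its entropy gives the rate an ideal entropy coder approaches. To keep the sketch short, the codeword assignment uses a Huffman construction (the alternative named in claim 6) rather than the arithmetic coder cited above. All helper names are hypothetical.

```python
import heapq
import math
from collections import Counter

import numpy as np


def scalar_quantize(errors, step):
    """Uniform mid-tread quantization of each component (step is a free parameter)."""
    return np.round(np.asarray(errors) / step).astype(int)


def pair_probabilities(q):
    """Empirical probability of each couple (q_x, q_y) of quantized values."""
    counts = Counter(map(tuple, np.asarray(q).tolist()))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def entropy_bits(probs):
    """Entropy in bits per couple: the rate an ideal entropy coder approaches."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def huffman_code(probs):
    """Prefix-free binary code with one codeword per couple (Huffman construction)."""
    if len(probs) == 1:
        return {pair: "0" for pair in probs}
    # Heap entries: (probability, tie-breaker, {couple: partial codeword}).
    heap = [(p, k, {pair: ""}) for k, (pair, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, tie, merged))
        tie += 1
    return heap[0][2]
```

Coding the couple as one symbol lets the code exploit any remaining correlation between the two prediction-error components, which separate per-component codes would miss. Note that `np.round` rounds exact halves to the nearest even integer.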

The other subbands are processed by means of the 3D SPIHT algorithm and then entropy coded. These detail subbands are encoded using the concept of zerotrees developed in the document "Embedded image coding ..." already cited, the main lines of implementation being the same as defined in the SPIHT algorithm for comparing sets of coefficients with decreasing thresholds. The first threshold is chosen as a power of two, $2^{n_{max}}$, such that the maximum magnitude M of all the wavelet coefficients satisfies $2^{n_{max}} \le M < 2^{n_{max}+1}$. Wavelet coefficients are compared with this threshold following a predefined order, which is known at both the encoder and the decoder sides, so it does not need to be transmitted in the bitstream. For example, with the notations in FIG. 6, the scanning order of the spatio-temporal subbands could be: t-LL-s-LLLH, t-LL-s-LLHL, t-LL-s-LLHH, t-LH-s-LLLL, t-LH-s-LLLH, t-LH-s-LLHL, t-LH-s-LLHH, t-LL-s-LLH, t-LL-s-LHL, t-LL-s-LHH, t-LH-s-LLH, t-LH-s-LHL, t-LH-s-LHH, and so on. Other scanning orders of the subbands are possible. Inside each subband, a simple solution is to use a raster scanning order. Other scanning strategies may also be implemented, corresponding to the privileged orientation of the details in each subband: horizontal scanning for subbands whose last indexing letters are LL or LH, vertical scanning for HL, and diagonal scanning for HH.
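The threshold choice and the orientation-dependent scanning inside a subband can be sketched as below. The sketch covers only these two ingredients, not the LIS/LIP/LSP set partitioning of SPIHT. Several details are assumptions: the orientation is read from the last two letters of a subband name such as "t-LH-s-LLHL", "diagonal" is taken to mean traversal along anti-diagonals, and the significance test uses |c| >= 2^n (as in the SPIHT paper, so that the largest coefficient becomes significant in the first pass). The helper names are hypothetical.

```python
import math

import numpy as np


def initial_exponent(coeffs):
    """n_max such that 2**n_max <= M < 2**(n_max + 1), with M the largest magnitude."""
    m = float(np.max(np.abs(coeffs)))
    if m == 0.0:
        raise ValueError("all coefficients are zero")
    _, e = math.frexp(m)  # m = f * 2**e with 0.5 <= f < 1
    return e - 1


def orientation(subband_name):
    """Last two indexing letters of a subband name such as 't-LH-s-LLHL' -> 'HL'."""
    return subband_name[-2:]


def scan_order(rows, cols, orient):
    """Scan positions of a subband following its privileged detail orientation."""
    if orient in ("LL", "LH"):   # horizontal (raster) scanning, row by row
        return [(r, c) for r in range(rows) for c in range(cols)]
    if orient == "HL":           # vertical scanning, column by column
        return [(r, c) for c in range(cols) for r in range(rows)]
    if orient == "HH":           # diagonal scanning along anti-diagonals r + c = d
        return [(r, d - r) for d in range(rows + cols - 1)
                for r in range(max(0, d - cols + 1), min(rows, d + 1))]
    raise ValueError(f"unknown orientation {orient!r}")


def significant_positions(subband, n, orient):
    """Positions whose magnitude reaches the threshold 2**n, in scan order."""
    t = 2.0 ** n
    return [(r, c) for r, c in scan_order(*subband.shape, orient)
            if abs(subband[r, c]) >= t]
```

For example, `initial_exponent` returns 2 for coefficients whose largest magnitude is 7, giving a first threshold of 4. Since encoder and decoder both derive the same scan from the subband name and shape, only the significance decisions need to be sent.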

The drawings and their description have illustrated rather than limited the invention, and it is clear that numerous alternatives may be proposed without departing from the scope of said invention. It must, for instance, be noted that the invention is not limited by the number and position of the neighbouring pixels considered for the spatio-temporal predictor, the method used for the motion estimation and compensation, the type of linear wavelet transform used for the tridimensional analysis and synthesis, or the adaptation algorithm used to compute the predictor coefficients.
What is claimed is: 

1. An encoding method for the compression of a video 
sequence divided in groups of frames decomposed by means 
of a tridimensional (3D) wavelet transform leading to a 
given number of successive resolution levels, said method 
being based on a hierarchical subband encoding process 
called "set partitioning in hierarchical trees" (SPIHT) and
leading from the original set of picture elements (pixels) of 
each group of frames to transform coefficients encoded with 
a binary format and constituting a hierarchical pyramid, said 
coefficients being ordered by means of magnitude tests 
involving the pixels represented by three ordered lists called 
list of insignificant sets (LIS), list of insignificant pixels 
(LIP) and list of significant pixels (LSP), said tests being
carried out in order to divide said original set of picture 
elements into partitioning subsets according to a division 
process that continues until each significant coefficient is 
encoded within said binary representation, and a spatio- 
temporal orientation tree — in which the roots are formed 
with the pixels of the approximation subband resulting from 
the 3D wavelet transform and the offspring of each of these 
pixels is formed with the pixels of the higher subbands 
corresponding to the image volume defined by these root 
pixels— defining the spatio-temporal relationship inside said 
hierarchical pyramid, said method, applied to the band-pass 
subbands of the spatio-temporal tree, being further charac- 
terized in that: 

(A) a vectorial differential pulse code modulation 
(DPCM) is used to separately encode the lowest fre- 
quency spatio-temporal subband, or approximation 
subband, according to the following conditions: 

(a) a spatio-temporal predictor, using not only values at 
the same location in past frames of the video 
sequence but also neighbouring values in the current 
frame, is constructed for each vector of coefficients 
having components in each frame of the approximation
subband, said vectorial coding feature coming
from the fact that the lowest frequency subband 
contains spatial low frequency subbands from at 
least two frames; 

(b) said DPCM uses constant prediction coefficients; 

(B) the quantification of the prediction error is carried out 
by means of a scalar quantization of the two vector 
components, followed by an assignment of a unique 
binary code associated to the probability computed for 
each given couple of quantized values; 

(C) the binary stream resulting from the steps (A) and (B) 
is encoded by a lossless process minimizing the entropy 
of the whole message. 

2. An encoding method for the compression of a video 
sequence divided in groups of frames decomposed by means 
of a tridimensional (3D) wavelet transform leading to a 
given number of successive resolution levels, said method 
being based on a hierarchical subband encoding process 
called "set partitioning in hierarchical trees" (SPIHT) and 
leading from the original set of picture elements (pixels) of 
each group of frames to transform coefficients encoded with 
a binary format and constituting a hierarchical pyramid, said 
coefficients being ordered by means of magnitude tests 
involving the pixels represented by three ordered lists called 
list of insignificant sets (LIS), list of insignificant pixels 
(LIP) and list of significant pixels (LSP), said tests being 
carried out in order to divide said original set of picture 
elements into partitioning subsets according to a division
process that continues until each significant coefficient is 
encoded within said binary representation, and a spatio- 
temporal orientation tree — in which the roots are formed 
with the pixels of the approximation subband resulting from 
the 3D wavelet transform and the offspring of each of these 
pixels is formed with the pixels of the higher subbands 
corresponding to the image volume defined by these root 
pixels— defining the spatio-temporal relationship inside said 
hierarchical pyramid, said method, applied to the band-pass 
subbands of the spatio-temporal tree, being further charac- 
terized in that: 

(A) a vectorial differential pulse code modulation 
(DPCM) is used to separately encode the lowest fre- 
quency spatio-temporal subband, or approximation 
subband, according to the following conditions: 

(a) a spatio-temporal predictor, using not only values at 
the same location in past frames of the video 
sequence but also neighbouring values in the current 
frame, is constructed for each vector of coefficients 
having components in each frame of the approxima- 
tion subband, said vectorial coding feature coming 
from the fact that the lowest frequency subband 
contains spatial low frequency subbands from at 
least two frames; 

(b) said DPCM uses constant prediction coefficients; 

(B) the quantification of the prediction error is carried out 
by means of a vectorial quantization using an optimal 
quantizer based on a generalized Lloyd-Max algorithm, 
a joint Laplacian probability density function for the 
two components of the quantized prediction error vec- 
tor being considered for said optimization; 

(C) the binary stream resulting from the steps (A) and (B) 
is encoded by a lossless process minimizing the entropy 
of the whole message. 

3. An encoding method according to claim 1, in which 
said DPCM becomes adaptive, the coefficients of the spatio- 
temporal predictor now taking into account scene changes 
by means of a least mean squares estimation of these
coefficients for each group of frames. 

4. An encoding method according to claim 3, in which a 
decision is taken about the fact that the predictor is most 
influenced by the spatial prediction or by the temporal one. 

5. An encoding method according to claim 1, in which 
said lossless process is based on arithmetic encoding. 

6. An encoding method according to claim 1, in which 
said lossless process is based on a Huffman encoding.